Exponential Family Matrix Completion under Structural Constraints
We consider the matrix completion problem of recovering a structured matrix from noisy and partial measurements. Recent works have proposed tractable estimators with strong statistical guarantees for the case where the underlying matrix is low-rank, and the measurements consist of a subset either of the exact individual entries or of the entries perturbed by additive Gaussian noise, which is thus implicitly suited for thin-tailed continuous data. Arguably, common applications of matrix completion require estimators for (a) heterogeneous data types, such as skewed-continuous, count, binary, etc., (b) heterogeneous noise models (beyond Gaussian), which capture varied uncertainty in the measurements, and (c) heterogeneous structural constraints beyond low rank, such as block-sparsity or a superposition structure of low rank plus elementwise sparsity, among others. In this paper, we provide a vastly unified framework for generalized matrix completion by considering a matrix completion setting wherein the matrix entries are sampled from any member of the rich family of exponential family distributions, and imposing general structural constraints on the underlying matrix, as captured by a general regularizer R(·). We propose a simple convex regularized M-estimator for the generalized framework, and provide a unified and novel statistical analysis for this general class of estimators. We finally corroborate our theoretical results on simulated datasets.
Comment: 20 pages, 9 figures
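As a concrete illustration of such an estimator, the sketch below (ours, not the paper's code; the Poisson choice, step size, and regularization weight are illustrative assumptions) instantiates a convex regularized M-estimator for one exponential-family member, count data with Poisson noise, using the nuclear norm as the structural regularizer R(·) and solving by proximal gradient descent:

```python
# A minimal sketch, assuming Poisson-distributed observed entries and a
# nuclear-norm regularizer; solved by proximal gradient descent via
# singular-value soft-thresholding. All parameter values are illustrative.
import numpy as np

def svd_soft_threshold(X, tau):
    """Prox of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def poisson_mc(Y, mask, lam=1.0, step=0.05, iters=500):
    """Estimate the natural-parameter matrix Theta from partial counts Y.

    Objective: Poisson negative log-likelihood on observed entries
    plus lam * ||Theta||_* (nuclear norm).
    """
    Theta = np.zeros_like(Y, dtype=float)
    for _ in range(iters):
        # Gradient of the Poisson NLL: exp(Theta) - Y on observed entries.
        grad = mask * (np.exp(Theta) - Y)
        Theta = svd_soft_threshold(Theta - step * grad, step * lam)
    return Theta

# Toy example: low-rank natural parameters, roughly half the entries observed.
rng = np.random.default_rng(0)
Theta_true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30)) * 0.3
Y = rng.poisson(np.exp(Theta_true))
mask = rng.random(Theta_true.shape) < 0.5
Theta_hat = poisson_mc(Y * mask, mask, lam=2.0)
print("relative error:", np.linalg.norm(Theta_hat - Theta_true) / np.linalg.norm(Theta_true))
```

Swapping exp(Theta), the gradient of the Poisson log-partition function, for that of another exponential-family member, and the nuclear-norm prox for another regularizer's prox, yields other instances of the framework the abstract describes.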
Mining structured matrices in high dimensions
Structured matrices refer to matrix-valued data that are embedded in an inherent lower-dimensional manifold, with fewer degrees of freedom than the ambient or observed dimensions. Such hidden (or latent) structures allow for statistically consistent estimation in high-dimensional settings, wherein the number of observations is much smaller than the number of parameters to be estimated. This dissertation makes significant contributions to statistical models, algorithms, and applications of structured matrix estimation in high-dimensional settings. The proposed estimators and algorithms are motivated by and evaluated on applications in e-commerce, healthcare, and neuroscience. In the first line of contributions, substantial generalizations of existing results are derived for the widely studied problem of matrix completion. Tractable estimators with strong statistical guarantees are developed for matrix completion under (a) generalized observation models subsuming heterogeneous data types, such as count, binary, etc., and heterogeneous noise models beyond additive Gaussian, (b) general structural constraints beyond low-rank assumptions, and (c) collective estimation from multiple sources of data. The second line of contributions focuses on algorithmic and application-specific ideas for generalized structured matrix estimation. Two specific applications of structured matrix estimation are discussed: (a) a constrained latent factor estimation framework that extends the ideas and techniques hitherto discussed and applies them to the task of learning clinically relevant phenotypes from Electronic Health Records (EHRs), and (b) a novel, efficient, and highly generalized algorithm for collaborative learning-to-rank (LETOR) applications.
Inductive Bias of Multi-Channel Linear Convolutional Networks with Bounded Weight Norm
We study the function space characterization of the inductive bias resulting from controlling the ℓ2 norm of the weights in linear convolutional networks. We view this in terms of an induced regularizer in the function space given by the minimum norm of weights required to realize a linear function. For two-layer linear convolutional networks with C output channels and kernel size K, we show the following: (a) If the inputs to the network have a single channel, the induced regularizer for any K is a norm given by a semidefinite program (SDP) that is independent of the number of output channels C. We further validate these results through a binary classification task on MNIST. (b) In contrast, for networks with multi-channel inputs, multiple output channels can be necessary to merely realize all matrix-valued linear functions, and thus the inductive bias does depend on C. Further, for sufficiently large C, the induced regularizers for K = 1 and K = D (where D is the input dimension) are the nuclear norm and the ℓ2,1 group-sparse norm, respectively, of the Fourier coefficients, both of which promote sparse structures.
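As a rough numerical companion (our simplification, not the paper's construction: the stacking of per-frequency coefficients into a channels-by-dimension matrix, and all shapes, are assumptions), the sketch below evaluates the two norms named in (b) on the Fourier coefficients of a multi-channel linear predictor, showing both come out smaller when the predictor is supported on few frequencies:

```python
# Toy illustration of the sparsity-promoting behaviour of the two norms
# the abstract names: the nuclear norm and the l_{2,1} group norm of the
# Fourier coefficients. Not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)

def induced_norms(W):
    """Nuclear and l_{2,1} group norms of the DFT of W (channels x dimension)."""
    F = np.fft.fft(W, axis=1)
    nuclear = np.linalg.norm(F, ord="nuc")   # sum of singular values
    group = np.linalg.norm(F, axis=0).sum()  # per-frequency l2, summed over frequencies
    return nuclear, group

dim, channels = 32, 4
t = np.arange(dim)
# Dense predictor: generic Gaussian weights.
dense = rng.normal(size=(channels, dim))
# Fourier-sparse predictor: every channel is a sum of two sinusoids.
sparse = np.tile(np.cos(2 * np.pi * 3 * t / dim) + np.sin(2 * np.pi * 7 * t / dim),
                 (channels, 1))
# Normalize to equal Frobenius norm so the comparison is fair.
dense /= np.linalg.norm(dense)
sparse /= np.linalg.norm(sparse)

for name, W in [("dense", dense), ("Fourier-sparse", sparse)]:
    print(name, induced_norms(W))
```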
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems,
with homogeneous linear predictors on linearly separable datasets. We show the
predictor converges to the direction of the max-margin (hard margin SVM)
solution. The result also generalizes to other monotone decreasing loss
functions with an infimum at infinity, to multi-class problems, and to training
a weight layer in a deep network in a certain restricted setting. Furthermore,
we show this convergence is very slow, and only logarithmic in the convergence
of the loss itself. This can help explain the benefit of continuing to optimize
the logistic or cross-entropy loss even after the training error is zero and
the training loss is extremely small, and, as we show, even if the validation
loss increases. Our methodology can also aid in understanding implicit
regularization in more complex models and with other optimization methods.
Comment: Final JMLR version, with improved discussions over v3. Main improvements in the journal version over the conference version (v2 appeared in ICLR): we proved the measure-zero case for the main theorem (with implications for the rates), and the multi-class case.
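The central claim is easy to reproduce numerically. The sketch below (ours, not the authors' code; the data, step size, and iteration counts are illustrative choices) runs gradient descent on the unregularized logistic loss over separable data and tracks the cosine between the normalized iterate and the hard-margin SVM direction, which creeps toward 1 at the slow logarithmic rate the abstract describes:

```python
# A minimal sketch: GD on unregularized logistic loss over separable data
# converges in direction to the hard-margin SVM solution.
import numpy as np
from scipy.special import expit
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable Gaussian blobs, labels in {-1, +1}.
X = np.vstack([rng.normal(2, 1, size=(50, 2)), rng.normal(-2, 1, size=(50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

# Hard-margin SVM direction (a large C approximates the hard margin).
svm = SVC(kernel="linear", C=1e6).fit(X, y)
w_svm = svm.coef_.ravel()
w_svm /= np.linalg.norm(w_svm)

# Gradient descent on the mean logistic loss, no regularization.
w = np.zeros(2)
for t in range(1, 200001):
    margins = y * (X @ w)
    # d/dw mean_i log(1 + exp(-y_i x_i . w)) = -mean_i y_i sigmoid(-m_i) x_i
    grad = -(X * (y * expit(-margins))[:, None]).mean(axis=0)
    w -= 0.1 * grad
    if t % 50000 == 0:
        print(t, "cosine to SVM direction:", w @ w_svm / np.linalg.norm(w))
```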
(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability
In this paper, we investigate the impact of stochasticity and large stepsizes
on the implicit regularisation of gradient descent (GD) and stochastic gradient
descent (SGD) over diagonal linear networks. We prove the convergence of GD and
SGD with macroscopic stepsizes in an overparametrised regression setting and
characterise their solutions through an implicit regularisation problem. Our
crisp characterisation leads to qualitative insights about the impact of
stochasticity and stepsizes on the recovered solution. Specifically, we show
that large stepsizes consistently benefit SGD for sparse regression problems,
while they can hinder the recovery of sparse solutions for GD. These effects
are magnified for stepsizes in a tight window just below the divergence
threshold, in the "edge of stability" regime. Our findings are supported by
experimental results.
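For concreteness, the following sketch (ours, not the paper's code) sets up the standard diagonal linear network parameterisation w = u ⊙ u − v ⊙ v and runs plain GD on an overparametrised sparse regression problem from a small initialisation; the step size, initialisation scale, and problem sizes are illustrative, and probing the large-stepsize and SGD effects the paper analyses amounts to varying the step size and subsampling the gradient:

```python
# A minimal sketch of GD over a diagonal linear network for
# overparametrised sparse regression, with w = u*u - v*v and a small
# initialisation. All values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 40, 100, 3                      # samples, dimension, sparsity
w_star = np.zeros(d); w_star[:k] = 1.0    # sparse ground truth
X = rng.normal(size=(n, d))
y = X @ w_star                            # noiseless, overparametrised (n < d)

alpha = 1e-3                              # initialisation scale
u = np.full(d, alpha)
v = np.full(d, alpha)
step = 0.02
for _ in range(20000):
    w = u * u - v * v
    g = X.T @ (X @ w - y) / n             # gradient of squared loss wrt w
    u -= step * 2 * u * g                 # chain rule through u*u - v*v
    v += step * 2 * v * g
w = u * u - v * v
print("recovery error:", np.linalg.norm(w - w_star))
print("largest off-support weight:", np.max(np.abs(w[k:])))
```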